Retrieving Collocations by Co-occurrences and Word Order Constraints
Abstract
In this paper, we describe a method for automatically retrieving collocations from large text corpora. The method retrieves collocations in two stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of those strings, in accordance with their word order in the corpus, as collocations. The method retrieves a wide range of collocations, especially domain-specific ones. It is practical because it uses plain text and requires no language-dependent information such as lexical knowledge or parts of speech.
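The two stages can be illustrated with a minimal Python sketch. The abstract does not give the statistics the authors use, so this sketch substitutes plain frequency thresholds; the function names, the parameters (`min_len`, `max_len`, `min_freq`), and the choice of one line as the co-occurrence window are assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def recurrent_strings(lines, min_len=2, max_len=6, min_freq=2):
    """Stage 1 (assumed form): keep character n-grams that recur in the corpus."""
    counts = Counter()
    for line in lines:
        for n in range(min_len, max_len + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return {s for s, c in counts.items() if c >= min_freq}


def ordered_combinations(lines, units, min_freq=2):
    """Stage 2 (assumed form): count ordered pairs of units that co-occur in a line."""
    pair_counts = Counter()
    for line in lines:
        # First position of every unit that occurs in this line, left to right.
        found = sorted((line.find(u), u) for u in units if u in line)
        seen = set()
        for (i, a), (j, b) in combinations(found, 2):
            if i + len(a) <= j:  # a ends before b starts, so the order is a, b
                seen.add((a, b))
        pair_counts.update(seen)  # each pair counts at most once per line
    return {p: c for p, c in pair_counts.items() if c >= min_freq}
```

In practice the output of `recurrent_strings` would be passed as `units` to `ordered_combinations`. Because only raw characters are counted, no tokenizer, lexicon, or part-of-speech tagger is needed, which matches the language-independence the abstract claims. A realistic implementation would also need to prune substrings that are subsumed by longer recurrent strings.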
Similar resources
Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is pr...
Retrieving Collocations from Text: Xtract
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to ...
A Three-Layered Collocation Extraction Tool and Its Application in China English Studies
We design a three-layered collocation extraction tool by integrating syntactic and semantic knowledge and apply it in China English studies. The tool first extracts peripheral collocations in the frequency layer from dependency triples, then extracts semi-peripheral collocations in the syntactic layer by association measures, and last extracts core collocations in the semantic layer with a simi...
Retrieving Collocations From Korean Text
This paper describes a statistical methodology for automatically retrieving collocations from POS tagged Korean text using interrupted bigrams. The free word order of Korean makes it hard to identify collocations. We devised four statistics, 'frequency', 'randomness', 'condensation', and 'correlation', to account for the more flexible word order properties of Korean collocations. We extracted meanin...
Word Sense Disambiguation in Untagged Text based on Term Weight Learning
This paper describes an unsupervised learning algorithm for disambiguating verbal word senses using term weight learning. In our method, collocations which characterise every sense are extracted using similarity-based estimation. For the results, term weight learning is performed. Parameters of term weighting are then estimated so as to maximise the collocations which characterise every sense and ...